-
Abstract: Understanding how cortical circuits generate complex behavior requires investigating the cell types that compose them. Functional differences across pyramidal neuron (PyN) types have been observed within cortical areas, but it is not known whether these local differences extend throughout the cortex, nor whether additional differences emerge when larger-scale dynamics are considered. We used genetic and retrograde labeling to target pyramidal tract, intratelencephalic and corticostriatal projection neurons and measured their cortex-wide activity. Each PyN type drove unique neural dynamics, both at the local and cortex-wide scales. Cortical activity and optogenetic inactivation during an auditory decision task revealed distinct functional roles. All PyNs in parietal cortex were recruited during perception of the auditory stimulus, but, surprisingly, pyramidal tract neurons had the largest causal role. In frontal cortex, all PyNs were required for accurate choices but showed distinct choice tuning. Our results reveal that rich, cell-type-specific cortical dynamics shape perceptual decisions.
-
Federated Learning (FL) has attracted increasing attention in recent years. A leading training algorithm in FL is local SGD, which updates the model parameters on each worker and only periodically averages model parameters across workers. Although it requires fewer communication rounds than classical parallel SGD, local SGD still incurs large communication overhead in each round for large machine learning models, such as deep neural networks. To address this issue, we propose a new communication-efficient distributed SGD method, which can significantly reduce the communication cost through an error-compensated double compression mechanism. Under the non-convex setting, our theoretical results show that our approach has better communication complexity than existing methods and enjoys the same linear speedup with respect to the number of workers as full-precision local SGD. Moreover, we propose a communication-efficient distributed SGD with momentum, which also has better communication complexity than existing methods and enjoys a linear speedup with respect to the number of workers. Finally, extensive experiments are conducted to verify the performance of our proposed methods.
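To make the error-compensated compression mechanism concrete, here is a minimal single-worker sketch using a simple top-k sparsifier. The names (topk_compress, ErrorCompensatedWorker) are illustrative only and are not the paper's API; in the double-compression scheme described above, an analogous compress-and-compensate step would also be applied to the averaged update sent back from the server to the workers.

```python
import numpy as np

def topk_compress(v, k):
    """Keep the k largest-magnitude entries of v and zero the rest (a common sparsifier)."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

class ErrorCompensatedWorker:
    """One worker that compresses its local update and carries the compression error forward."""
    def __init__(self, dim, k, lr=0.1):
        self.error = np.zeros(dim)  # residual left over from earlier compressions
        self.k = k
        self.lr = lr

    def compressed_update(self, grad):
        corrected = self.lr * grad + self.error  # add back what was previously dropped
        msg = topk_compress(corrected, self.k)   # low-cost message sent to the server
        self.error = corrected - msg             # remember what this compression lost
        return msg
```

The key property of error compensation is that nothing is permanently discarded: whatever the compressor drops in one round is folded into the message of a later round.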
-
Although distributed machine learning methods can speed up the training of large deep neural networks, communication cost has become a non-negligible bottleneck that constrains performance. To address this challenge, gradient-compression-based communication-efficient distributed learning methods were designed to reduce the communication cost, and more recently local error feedback was incorporated to compensate for the corresponding performance loss. In this paper, however, we show that local error feedback raises a new "gradient mismatch" problem in centralized distributed training, which can lead to degraded performance compared with full-precision training. To solve this critical problem, we propose two novel techniques, 1) step ahead and 2) error averaging, with rigorous theoretical analysis. Both our theoretical and empirical results show that our new methods can handle the "gradient mismatch" problem. Experiments show that, with common gradient compression schemes, our methods can even train in fewer epochs than both full-precision training and local error feedback, without performance loss.
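As a reference point for the local-error-feedback baseline discussed above, below is a minimal sketch of one synchronous step of centralized training with compressed gradients and per-worker error feedback. The names sign_compress and train_step are hypothetical, and the sketch does not implement the paper's step-ahead or error-averaging techniques; it only illustrates the baseline in which each worker's residual error is folded in locally and never seen by the server, one place where a mismatch between compressed and full-precision updates can accumulate.

```python
import numpy as np

def sign_compress(v):
    """1-bit sign compression scaled by the mean magnitude (a standard low-bit compressor)."""
    return np.sign(v) * np.mean(np.abs(v))

def train_step(worker_grads, params, errors, lr=0.05):
    """One synchronous step: each worker compresses (gradient + its local error),
    and the server averages the compressed messages to update the parameters."""
    msgs = []
    for i, g in enumerate(worker_grads):
        corrected = g + errors[i]       # local error feedback
        msg = sign_compress(corrected)  # compressed message sent to the server
        errors[i] = corrected - msg     # residual stays on the worker
        msgs.append(msg)
    update = np.mean(msgs, axis=0)      # server-side average of compressed gradients
    return params - lr * update, errors
```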